Abstract: Energy disaggregation, also known as non-intrusive
load monitoring (NILM), is the task of separating
aggregate energy data for a whole building into the energy 
data for individual appliances. Studies have shown that simply 
providing disaggregated data to the consumer improves energy 
consumption behavior. However, placing individual sensors on 
every device in a home is not presently a practical solution. 
Disaggregation provides a feasible method for providing energy 
usage behavior data to the consumer which utilizes currently 
existing infrastructure. In this paper, we present a novel 
framework to perform the energy disaggregation task. We 
model each individual device as a single-input, single-output 
system, where the output is the power consumed by the device 
and the input is the device usage. In this framework, the task 
of disaggregation translates into finding inputs for each device
that generate our observed power consumption. We describe
an implementation of this framework, and show its results on 
simulated data as well as data from a small-scale experiment. 

I. INTRODUCTION

This paper is motivated by the need for efficient energy
management solutions for the retail distribution domain of
the smart grid. Here, under the term retail distribution domain,
we mean the interactions between the local distributors, e.g. 
the utility companies, and customers, e.g. building occupants. 
Usage of smart energy management devices has enabled new 
functionalities and has brought the potential for increased 
energy efficiency via real-time control and monitoring. 

Currently, we focus on commercial and residential build- 
ings. Commercial and residential buildings are major users 
of energy in the developed world. Buildings account for 20- 
40% of total energy consumption [1]. We seek to provide 
customers with individual device power consumption infor- 
mation. Studies have shown that simply providing such data 
improves the consumer's energy consumption behavior [2]. 

Current monitoring methods measure total consumption 
for a building. Placing individual sensors on every device in 
a home is not presently a practical solution. Disaggregation, 
also known as non-intrusive load monitoring (NILM), is 
the task of separating aggregate energy data for a whole 
building into the component energy data for individual
devices, e.g. refrigerators, stovetops, washing machines, etc.
Disaggregation provides a feasible method for providing 
energy usage behavior data to the consumer, thereby allowing 
them to identify behavioral trends or device malfunctions that 
lead to inefficiencies, without requiring major infrastructural 
changes such as the addition of individual sensors on each 
device or power receptacle. 

Outside of informing consumers about ways to improve 
energy efficiency, disaggregation presents an opportunity for 
utility companies to strategically market products to con- 
sumers. It is now common practice for companies to monitor 
our online activity and then present advertisements which 
are targeted to our interests. This is known as 'personalized
advertising'. Disaggregation of energy data provides a means
to similarly market products to consumers. This leads to the 
question of user privacy and the question of ownership with 
regards to power consumption information. Treatment of the 
issue of consumer privacy in the smart grid is outside the 
scope of this paper. However, this is discussed in [3]. 

Additionally, disaggregation presents opportunities
for improved control. Many devices, such as heating, ven- 
tilation, and air conditioning (HVAC) units in residential 
and commercial buildings implement control policies that 
are dependent on real-time measurements. Disaggregation 
can provide information to controllers about system faults, 
such as device malfunction, which may result in inefficient 
control. It can also provide information about energy usage 
which is informative for demand response programs. 

We focus on designing disaggregation methods by using 
dynamical models of devices and formulating the disag- 
gregation problem in an optimal control framework. By 
working within the dynamical systems and optimal control 
framework, we hope that our algorithms will lend themselves 
to easy integration into current real-time optimal control of 
smart devices within the buildings and for facilitating the 
implementation of flexible demand response mechanisms by 
utilities. We designed and set up an experiment to collect 
data which we use for disaggregation. 

The rest of the paper proceeds as follows. In Section II,
we discuss the relevant background and existing literature. In
Section III, we describe our dynamical system framework for
disaggregation and its implementation. In Section IV,
we test our implementation on simulated data and show
results. In Section V, we describe the experimental setup
for collecting energy data and discuss the results of the
proposed disaggregation method on data from a small-scale
experiment. In Section VI, we make concluding remarks and
discuss future work.



II. BACKGROUND 

The problem of non-intrusive load monitoring and the
existing hardware for non-intrusive load monitoring have been
studied extensively in the literature (see [4], [5]). The general
consensus is that non-intrusive load monitoring is a method 
to present the consumer with information that makes them 
aware of their usage and potentially provides them insight 
into how to improve the efficiency of their usage. Further, 
the technology to perform non-intrusive load monitoring is 
becoming widely available. Hence, there is a need for flexible 
and efficient disaggregation algorithms. 

Disaggregation of energy data has emerged as one possible 
solution for identifying consumer behavior patterns and 
device malfunctions which lead to inefficient usage of energy. 
The goal of the current disaggregation literature is to present 
methods for improving energy monitoring at the consumer 
level without having to place sensors at device level, but 
rather use existing sensors at the whole building level. The 
concept of disaggregation is not new; however, only recently 
has it gained attention in the energy research domain. This 
is likely due to the emergence of smart meters and big data 
analytics. 

Broadly speaking, disaggregation in essence is a single- 
channel source separation problem. The problem of recov- 
ering the components of an aggregate signal is an inverse 
problem and as such is, in general, ill-posed. Most dis- 
aggregation algorithms are batch algorithms and produce 
an estimate of the disaggregated signals given a batch of 
aggregate recordings. There have been a number of survey 
papers summarizing the existing methods (e.g. see [6], [7]). 
In an effort to be as self-contained as possible, we try to 
provide a broad overview of the existing methods and then 
explain how the disaggregation method presented in this 
paper differs from existing solutions. 

The literature can be divided into two main approaches, 
namely, supervised and unsupervised. Supervised disaggre- 
gation methods require a disaggregated data set for training. 
This data set could be obtained by, for example, monitoring 
typical appliances using plug sensors. Supervised methods 
assume that the variation between signatures for the same
type of appliance is less than that between signatures of
different types of appliances. Hence, the disaggregated data 
set does not need to be from the building that the supervised 
algorithm is designed for. However, the disaggregated data 
set must be collected prior to deployment, and come from 
appliances of a similar type to those in the target building. 
Supervised methods are typically discriminative. 

Unsupervised methods, on the other hand, do not require 
a disaggregated data set to be collected. They do, however, 
require hand tuning of parameters, which can make it hard 
for the methods to be generalized in practice. Supervised
methods also have tuning parameters,
but these can often be tuned using the training data.

The existing supervised methods include sparse coding [8], 
change detection and clustering based approaches [9], [10] 
and pattern recognition [11]. The sparse coding approach
tries to reconstruct the aggregate signal by selecting as few
signatures as possible from a library of typical signatures. 
Similarly, in our proposed framework we construct a library 
of dynamical models and reconstruct the aggregate signal by 
using as few as possible of these models. 

The existing unsupervised methods include factorial hid- 
den Markov models (HMMs), difference hidden Markov 
models and variants [12], [13], [14], [15], [16] and temporal 
motif mining [17]. Most unsupervised methods model the
on/off sequences of appliances using some variation of 
HMMs. These methods do not make use of the signature of a 
device and assume that the power consumption is piecewise 
constant. 

All methods we are aware of lack the use of the dynamics
of the devices. While the existing supervised methods often
do use device signatures, these methods are discriminative;
an ideal method would have a dynamical model that is
capable of generating a device signature given a combination
of initial state and input. Both HMMs and linear dynamical
models are generative as opposed to discriminative, mak- 
ing them more advantageous for modeling complex system 
behavior. In the unsupervised domain, HMMs are used; 
however, they are not estimated using data and they do not 
model the signature of a device. The method we develop 
in this paper will combine the use of a generative model, 
i.e. linear dynamical models of devices, with a supervised 
approach to disaggregation. 

III. DYNAMICAL MODELS 

A. Framework 

In our dynamical model framework, we model individual 
devices as single-input, single-output systems, where the 
output is the power consumed by the device and the input is 
the device usage. That is, the input is zero if a device is off, 
and the input is nonzero if the device is on. Thus, for device 
$i$, we have a model of the following form: $y_i = h_i(u_i)$, where
$y_i$ is the power consumption signal of the device, $u_i$ is the
input to the device, and $h_i$ is a function that represents the
underlying dynamics. We build a library of models which 
represent the appliance types we are interested in. 

With a model for each device, we treat the total power 
consumption as the aggregate output of all devices, i.e. 
$y = \sum_{i=1}^{D} y_i$, where $y$ is the total power consumption
signal and $D$ is the total number of device models. The
task of disaggregation then translates into finding inputs for
each device that generate our observed power consumption.
In general, this solution will not be unique without more 
constraints on the input. Incorporating some prior on the 
form of the input, the problem becomes the following: 

$$
\begin{aligned}
\arg\min_{y,\,u} \quad & L(y, y_m) + g(u) \\
\text{subj. to} \quad & y_i = h_i(u_i) \quad \text{for } i \in \{1, \dots, D\} \\
& y = \sum_{i=1}^{D} y_i
\end{aligned} \tag{1}
$$

where $y_m$ is the measured power consumption, $y$ is the
estimated power consumption, $L$ is a loss function penalizing
deviations of $y$ from $y_m$, and $g$ is a regularization on the input
that incorporates our priors.

B. Implementation 

In this framework, the task of disaggregation can be 
broken down into two steps: system identification and dis- 
aggregation. 

1) System identification: In the system identification step, 
we seek to build a library of models which represent all the 
devices we are interested in. We assume we are given time- 
series measurements of power consumption for individual 
devices, e.g. a toaster, a kettle, or an LCD projector, and we
wish to find a model to capture the dynamics underlying 
the signal. This task has a deep history and well-established 
literature and results [18]. 

More specifically, for some device $i$, we are given $T$ power
usage samples, $y_i[k] \in \mathbb{R}$ for $k \in \{1, \dots, T\}$, and a sequence
of corresponding inputs, $u_i[k]$ for $k \in \{1, \dots, T\}$. Assuming
our world is causal, our goal is to find a satisfactory model
such that $y_i[k] = h_i^k(u_i[1], \dots, u_i[k])$.

Throughout this paper, we will use linear time-invariant 
(LTI) state-space models to represent the power consumption 
dynamics of individual devices, i.e. systems of the form: 



$$
\begin{aligned}
x_i[k+1] &= A_i x_i[k] + b_i u_i[k] \\
y_i[k] &= c_i^T x_i[k] + d_i u_i[k]
\end{aligned} \tag{2}
$$

where $n$ is the order of the device model and $x_i[k] \in \mathbb{R}^n$
for $k = 1, \dots, T$ is a state underlying the dynamics. The
framework generalizes to nonlinear, time-varying models as 
well, but for simplicity we merely consider the LTI case here. 
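
As an illustration, a device model of the form (2) can be simulated with a few lines of Python. This is a minimal sketch, not our implementation: the function name `simulate_lti` and the first-order example system (pole 0.8, DC gain 1) are assumptions chosen only to keep the example small.

```python
import numpy as np

def simulate_lti(A, b, c, d, u, x0=None):
    """Simulate x[k+1] = A x[k] + b u[k], y[k] = c^T x[k] + d u[k]."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).ravel()
    c = np.asarray(c, dtype=float).ravel()
    u = np.asarray(u, dtype=float)
    # Start from rest unless an initial state is supplied.
    x = np.zeros(A.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = c @ x + d * uk   # output equation
        x = A @ x + b * uk      # state update
    return y

# Hypothetical first-order device: x[k+1] = 0.8 x[k] + 0.2 u[k], y[k] = x[k].
A, b, c, d = [[0.8]], [0.2], [1.0], 0.0

# Piecewise-constant input: off for 5 samples, then on with setting 2.0.
u = np.concatenate([np.zeros(5), 2.0 * np.ones(60)])
y = simulate_lti(A, b, c, d, u)
```

Because the example system has a DC gain of 1, the simulated output settles at the input setting once the transient has decayed.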

Note that, under the assumption that similar devices have 
similar power consumption profiles, these models can be 
estimated offline. That is, for the task of disaggregation, 
we only need to estimate models for each class of devices 
once. Afterward, due to their generative nature, these models 
can be used for any household. Thus, this dynamical system 
framework would be cost-effective to deploy widely. 

Furthermore, while power usage data can be easily 
recorded with plug sensors, it is not as convenient to record 
the input signal, $u_i[\cdot]$, for each plug. Thus, at this step, it may
be necessary to apply blind system identification techniques, 
i.e. techniques for the case where both the system dynamics 
and the inputs are unknown. A detailed coverage of blind 
system identification is outside the scope of this paper; we 
refer the interested reader to the following references: [19], 
[20]. The authors of this paper have also devised a
method for blind system identification motivated by the
disaggregation problem; see [21].

2) Disaggregation: With these dynamical models in hand, 
we can treat disaggregation as the task of finding an input 
that generates our observed output. The problem formulation 
is as follows. We are only given samples of aggregated power
consumption for a household, $y_m[k] \in \mathbb{R}$ for $k = 1, \dots, T$.
Also, we know that the majority of the power consumption 
signal originates from a subset of our D modeled devices. 
We want to find inputs which result in a similar power 
consumption signal. 



In this paper, we take the inputs of the system to be the 
device's setting when it is on. Take a conventional oven as an 
example. It can be off, or it could be on with a temperature 
setting that takes on continuous values. In this situation, the 
input is zero if the oven is off, or the input is the temperature 
setting if the oven is on. An important distinction is that the 
input is the temperature setting, not the temperature of the 
oven itself; the input can be thought of as a command to 
the device, e.g. if a user sets the oven to 350°F at time $k^*$,
the input is $u_{\text{oven}}[k] = 0$ for $k < k^*$ and $u_{\text{oven}}[k] = 350$
for $k \ge k^*$. Looking at this example, we can see that a
reasonable prior would be that the inputs $u_i$ are piecewise
constant, and that the changes in $u_i$ across time are sparse.
Throughout our paper, we use this as our prior on the inputs. 

Returning to Equation (1), we define:

$$
\Delta u = \begin{bmatrix} u[1] - u[0] \\ u[2] - u[1] \\ \vdots \\ u[T] - u[T-1] \end{bmatrix} \tag{3}
$$

and we take $g(x) = \operatorname{card}(x)$, i.e. the number of nonzero
elements in $x$, evaluated at $x = \Delta u$. Furthermore, we take $L$ to be the Euclidean
distance on $\mathbb{R}^T$. Thus, we have our optimization defined.
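
To make the regularizer concrete, the following Python sketch computes $\Delta u$ from Equation (3) and its cardinality for the oven example. The helper names `delta` and `card`, the tolerance, and the choice $u[0] = 0$ are assumptions made for illustration.

```python
import numpy as np

def delta(u, u0=0.0):
    """Return the vector of differences u[k] - u[k-1], with u[0] = u0."""
    u = np.asarray(u, dtype=float)
    return np.diff(np.concatenate([[u0], u]))

def card(x, tol=1e-9):
    """Number of entries of x whose magnitude exceeds tol."""
    return int(np.sum(np.abs(x) > tol))

# Oven input: off for 10 samples, set to 350 for 20 samples, then off again.
u_oven = np.concatenate([np.zeros(10), 350.0 * np.ones(20), np.zeros(5)])
du = delta(u_oven)
n_switches = card(du)  # one switch-on and one switch-off
```

A piecewise-constant input with few setting changes therefore has a small value of $g$, regardless of how long the device stays on.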

A common approach when one is trying to minimize the
cardinality of a vector is to relax the cardinality into the $\ell_1$
norm, which is convex. However, we found that this performs
poorly in our framework. A likely explanation is that when
a linear system is converted into the linear operator $\Delta u \mapsto y$,
it will often fail to meet the desiderata for the $\ell_1$ relaxation.

Another technique is necessary. First, we note that if we
know which elements of $\Delta u$ are nonzero, i.e. which devices
turned on or off at what time, then it is easy to find the optimal
$\Delta u$. We call each such choice of nonzero elements a configuration. When
$g(\cdot)$ is the cardinality operator, the optimal configuration is,
informally, the configuration which results in the best fit with
the fewest nonzero entries. However, finding this solution is 
combinatorial. 
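
The observation that a fixed configuration is easy to solve can be sketched as follows. With LTI device models and known switching times, the aggregate output is linear in the input magnitudes, so they follow from ordinary least squares. This is an illustrative sketch under simplifying assumptions: each device is on over a single interval, the devices are hypothetical first-order systems, and the function names are ours for this example only. It also uses the plain LTI response when a device turns off, rather than any device-specific off behavior.

```python
import numpy as np

def simulate_lti(A, b, c, d, u):
    """Zero-initial-state response of x[k+1] = A x + b u, y = c^T x + d u."""
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = c @ x + d * uk
        x = A @ x + b * uk
    return y

def fit_amplitudes(models, intervals, y_m):
    """Least-squares input magnitudes for a fixed configuration.

    intervals[i] = (k_on, k_off) means device i is on for k_on <= k < k_off.
    """
    T = len(y_m)
    cols = []
    for (A, b, c, d), (k_on, k_off) in zip(models, intervals):
        unit = np.zeros(T)
        unit[k_on:k_off] = 1.0          # unit-magnitude input on the interval
        cols.append(simulate_lti(A, b, c, d, unit))
    Phi = np.column_stack(cols)         # one regressor column per device
    amps, *_ = np.linalg.lstsq(Phi, y_m, rcond=None)
    return amps, Phi @ amps

# Two hypothetical first-order devices, both with DC gain 1.
models = [
    (np.array([[0.5]]), np.array([0.5]), np.array([1.0]), 0.0),
    (np.array([[0.9]]), np.array([0.1]), np.array([1.0]), 0.0),
]
intervals = [(10, 60), (40, 90)]
T = 120
u1 = np.zeros(T); u1[10:60] = 1.2
u2 = np.zeros(T); u2[40:90] = 2.0
y_m = simulate_lti(*models[0], u1) + simulate_lti(*models[1], u2)

amps, y_fit = fit_amplitudes(models, intervals, y_m)
```

The difficulty of the full problem lies in choosing the intervals, not in this inner fit.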

We seek relaxations which will make this optimization 
tractable. We assume that, at each time step, at most one device
turns on or off. This is not an egregious assumption
if our sampling rate is sufficiently high. Also, we assume
that the devices switch on and off in sequence; a device 
does not turn on and then on again afterward. We can sort by 
time and place our possible configurations in a tree structure. 
More formally, at each time step, one of $D + 1$ things can
happen: a device $d \in \{1, \dots, D\}$ switches on or off, where
only one of the two options is possible depending on its
current configuration, or no device changes configuration.
This induces a hierarchical ordering on configurations of
different time intervals. That is, at depth $T$ of the tree, the
nodes are configurations at times $k \in \{1, \dots, T\}$, and each
node's children are configurations on $\{1, \dots, T + 1\}$.
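
The branching described above can be sketched in Python to show why the full tree grows quickly. The representation of a node by its current on/off state and the function names are assumptions made for this illustration.

```python
def children(state):
    """Successors of an on/off state; state[d] is True if device d is on.

    Either no device changes, or exactly one device toggles, which gives
    D + 1 children for D devices.
    """
    kids = [state]                      # no device changes configuration
    for d in range(len(state)):
        toggled = list(state)
        toggled[d] = not toggled[d]     # device d switches on or off
        kids.append(tuple(toggled))
    return kids

def count_configurations(D, T):
    """Number of nodes at depth T when all devices start off."""
    level = [tuple([False] * D)]
    for _ in range(T):
        level = [kid for s in level for kid in children(s)]
    return len(level)

n_nodes = count_configurations(D=2, T=3)
```

Since every node has $D + 1$ children, depth $T$ contains $(D + 1)^T$ nodes, which is why enumerating the tree is not practical without pruning.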

If we think of the configuration at a given time as a 
mode, then this is a hybrid system estimation problem. The 
combinatorial problem above is often called a complete filter 
bank. This is still intractable, but we can use heuristics to
intelligently prune or merge the tree and keep the set of
possible configurations manageable. For the general problem, 
pruning and merging methods are discussed in [22], [23], 
[24]. These methods are known as generalized pseudo- 
Bayesian filters or interacting multiple models. Also, note 
that these algorithms allow for disaggregation to be done 
online. 

The disaggregation problem allows for several intuitive 
heuristics. First, if a given configuration continues to model 
the future data well, we assume no device changes state. Sec- 
ond, if the power consumption increases by a certain amount, 
a device is turning on. Finally, if the power consumption 
decreases by a significant amount, a device is turning off. 
These three heuristics are sufficient to make our optimization 
problem extremely efficient. 
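
A minimal sketch of how these three heuristics could be expressed in code is given below. The function name `first_event`, the single symmetric threshold, and the toy signals are assumptions for illustration; they are not taken from our implementation.

```python
import numpy as np

def first_event(y_m, y_pred, threshold):
    """Find the first sample where the current configuration stops fitting.

    Returns None if the residual stays within the threshold (no device
    changes state); otherwise returns (k, "on") if the measurement rose
    above the prediction, or (k, "off") if it fell below it.
    """
    e = np.asarray(y_m, dtype=float) - np.asarray(y_pred, dtype=float)
    hits = np.flatnonzero(np.abs(e) > threshold)
    if hits.size == 0:
        return None
    k = int(hits[0])
    return k, ("on" if e[k] > 0 else "off")

# Toy signals: the prediction is flat, the measurement jumps at k = 4.
y_pred = np.zeros(10)
y_rise = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)
event = first_event(y_rise, y_pred, threshold=0.5)
```

Only when an event is returned does the more expensive search over devices and switching times need to run.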

IV. SIMULATION 
We implemented the disaggregation algorithm on simulated
data. We generated $D = 5$ third-order single-input,
single-output systems using MATLAB's drss function,
normalized to have a DC gain of 1. The step responses for
these 5 systems can be seen in Figure 1. Let the dynamics
of each system be represented with matrices $A_i, b_i, c_i^T, d_i$ for
$i \in \{1, \dots, D\}$. We assume we are given the true models for
each of these $D$ devices.

Fig. 1. The step responses of $D = 5$ randomly generated device models,
spaced apart by 30 time steps.



We also observed that many real-life devices seem to have 
different dynamics when switching on and when switching 
off. For example, consider the root-mean-square (RMS) current
signal of a toaster, represented in Figure 2. There is
overshoot when the toaster switches on, but the off dynamics 
do not show the same behavior. In fact, in all of the devices 
we measured, we found that when a device turns off, the 
power drops to a negligible amount almost instantly. We 
factor this into our simulated models as well. 

Fig. 2. The measured RMS current signal for a toaster. Note that the
on-switches display overshoot while the off-switches do not.

Then, we created output signals for each system by using
the inputs in Equation (4). These inputs were chosen to overlap
significantly. Also, not every device is activated during the
simulation. The aggregated signal was created by summing
the individual outputs as well as white noise with mean 0
and standard deviation 0.02.

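$$
\begin{aligned}
u_1[k] &= 1.2 && \text{for } k \in \{20, \dots, 100\} \\
u_2[k] &= 2 && \text{for } k \in \{130, \dots, 400\} \\
u_3[k] &= 0.6 && \text{for } k \in \{180, \dots, 300\} \\
u_4[k] &= 1.8 && \text{for } k \in \{250, \dots, 350\} \\
u_i[k] &= 0 && \text{otherwise}
\end{aligned} \tag{4}
$$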
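
The simulation setup can be approximated in Python as sketched below. This is not the MATLAB code used for our results: the horizon `T = 450`, the random seed, and the first-order stand-in systems (used here in place of the third-order drss models) are assumptions for illustration, and the stand-ins do not reproduce the instantaneous switch-off behavior discussed above. Only the input magnitudes, intervals, and noise level are taken from the text.

```python
import numpy as np

T = 450                          # assumed horizon, long enough to cover all inputs
D = 5
rng = np.random.default_rng(0)   # assumed seed, for repeatability

# Inputs of Equation (4): device index -> (magnitude, first k, last k), inclusive.
spec = {0: (1.2, 20, 100), 1: (2.0, 130, 400), 2: (0.6, 180, 300), 3: (1.8, 250, 350)}
U = np.zeros((D, T))
for i, (amp, k_first, k_last) in spec.items():
    U[i, k_first:k_last + 1] = amp   # the fifth device is never activated

def first_order_response(pole, u):
    """Hypothetical stand-in device with DC gain 1 and a single real pole."""
    x = 0.0
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = x
        x = pole * x + (1.0 - pole) * uk
    return y

poles = np.array([0.5, 0.6, 0.7, 0.8, 0.9])   # assumed, one per device
Y = np.vstack([first_order_response(p, U[i]) for i, p in enumerate(poles)])

# Aggregate signal: sum of device outputs plus zero-mean white noise (std 0.02).
y_m = Y.sum(axis=0) + rng.normal(0.0, 0.02, size=T)
```

The rows of `Y` play the role of ground truth against which a disaggregation estimate can be compared.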

We then run the disaggregation method described in Section
III-B. For simplicity, we assume that the input is zero
initially. Then, as long as this configuration's expected output
and the observed output are within a certain threshold, we 
keep the same configuration. When the observed output ex- 
ceeds this threshold, we determine if the signal is increasing 
or decreasing. If it is increasing, we look at all devices and 
nearby times to find the device that best explains the change 
in the measured data, as well as nearby data afterward, when 
driven with a constant input. If it is decreasing, since all 
devices turn off in the same fashion, we determine which 
device turned off by looking at the contribution of each 
device in the estimated configuration. 

More formally, let $y_m$ be our measured signal, and let $y$
be the predicted output under the estimated configuration.
Suppose we detect a change at time $k^*$ and let $N$ be our
lookahead time. Then, for nearby times $k'$ and devices $i \in
\{1, \dots, D\}$ which are not currently on, we calculate:



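$$
\begin{aligned}
\min_{y_i, x_i, u} \quad & \| e - y_i \|_2 \\
\text{subj. to} \quad & x_i[k+1] = A_i x_i[k] + b_i u \quad \text{for } k \in \{k', k'+1, \dots, k^*+N-1\} \\
& x_i[k'] = 0 \\
& y_i[k] = c_i^T x_i[k] + d_i u \quad \text{for } k \in \{k', \dots, k^*+N\}
\end{aligned} \tag{5}
$$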



where $e[k] = y_m[k] - y[k]$ for $k \in \{k', \dots, k^* + N\}$ is the
deviation we need to explain. Note here that $u$ is a scalar,
not a time-dependent signal. That is, given $k'$ and $i$, we find
the best input magnitude to explain the behavior. Also, note
that we are implicitly reducing the cardinality of $\Delta u$, as
well as reducing the number of needed calculations, by only 
making these estimations when our estimated configuration 
is not satisfactory. Furthermore, if we wish to do online 
disaggregation, the lookahead parameter, N, determines how 
much delay is needed. The disaggregation estimate is: 



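$$
\begin{aligned}
\hat{u}_1[k] &= 1.2017 && \text{for } k \in \{20, \dots, 100\} \\
\hat{u}_2[k] &= 2.0104 && \text{for } k \in \{130, \dots, 400\} \\
\hat{u}_3[k] &= 0.5827 && \text{for } k \in \{180, \dots, 300\} \\
\hat{u}_4[k] &= 1.7987 && \text{for } k \in \{250, \dots, 350\} \\
\hat{u}_i[k] &= 0 && \text{otherwise}
\end{aligned} \tag{6}
$$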



Every device is successfully identified, and the switching
times are also correctly identified. The simulated data is
plotted against the estimated data in Figure 3.

Fig. 3. The simulated disaggregation results.
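
The per-candidate fit in Equation (5) admits a simple closed form, sketched below in Python. With zero initial state and a constant scalar input $u$, the device output is $u$ times its unit step response, so the best $u$ is a one-dimensional least-squares projection. The function names, the search window, and the first-order test devices are assumptions for illustration. As a small departure from Equation (5), the sketch scores every candidate over one common window (predicting zero before $k'$) so that costs for different $k'$ are directly comparable.

```python
import numpy as np

def step_response(A, b, c, d, n):
    """Unit step response from zero initial state, n samples."""
    x = np.zeros(A.shape[0])
    s = np.empty(n)
    for k in range(n):
        s[k] = c @ x + d
        x = A @ x + b
    return s

def best_on_candidate(e, models, on, k_star, N, window):
    """Search devices that are off and start times k' near k_star.

    Returns (cost, device index, k', magnitude u) for the best fit.
    """
    lo = max(0, k_star - window)
    hi = min(len(e), k_star + N + 1)
    seg = np.asarray(e[lo:hi], dtype=float)
    best = None
    for i, (A, b, c, d) in enumerate(models):
        if on[i]:
            continue                            # only devices not currently on
        for k0 in range(lo, k_star + 1):
            s = step_response(A, b, c, d, hi - k0)
            target = seg[k0 - lo:]
            denom = float(s @ s)
            if denom == 0.0:
                continue
            u = float(s @ target) / denom       # optimal scalar magnitude
            pred = np.zeros_like(seg)
            pred[k0 - lo:] = u * s
            cost = float(np.linalg.norm(seg - pred))
            if best is None or cost < best[0]:
                best = (cost, i, k0, u)
    return best

# Hypothetical devices; the residual is device 1 switching on at k = 20 with u = 1.8.
models = [
    (np.array([[0.5]]), np.array([0.5]), np.array([1.0]), 0.0),
    (np.array([[0.8]]), np.array([0.2]), np.array([1.0]), 0.0),
]
e = np.zeros(60)
e[20:] = 1.8 * step_response(*models[1], 40)
cost, dev, k_on, u_hat = best_on_candidate(e, models, on=[False, False],
                                           k_star=22, N=20, window=5)
```

The lookahead `N` controls how much of the transient is used to tell devices apart, at the price of a longer delay in an online setting.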



V. EXPERIMENT 

For the verification of our disaggregation method, we
deployed a small-scale experiment. To collect the data, we
use the emonTx wireless open-source energy monitoring
node from OpenEnergyMonitor (http://openenergymonitor.org/emon/emontx).
We use current transformer
(CT) sensors and an alternating current (AC) to AC power
adapter to measure the current and voltage, respectively, of
the devices that we monitored. For each device we measure
the root-mean-square (RMS) current ($I^i_{RMS}$), RMS voltage
($V^i_{RMS}$), apparent power ($P^i_{VA}$), real power ($P^i_W$), power
factor ($\phi^i_{pf}$), and a UTC time stamp, where the superscript
$i$ denotes the $i$th device. The sampling rate is 12 Hz.

For our experiment, we focused on small devices that
would be featured in a residential or commercial office
building. First, we took individual plug-level measurements
for a kettle, a toaster, a projector, a monitor, and a microwave.
These devices consume anywhere from 70 W to 1800 W. We
labeled the devices {1,...,5}, respectively. For the blind
system identification of each of these devices, we used a
simple change detection algorithm to generate input signals.
Then, we fit autoregressive models with exogenous inputs.
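
The two identification steps can be sketched in Python as follows. This is a generic least-squares illustration under stated assumptions, not our exact procedure: the thresholding rule in `on_off_input`, the model orders, and the synthetic test signal are hypothetical.

```python
import numpy as np

def on_off_input(power, threshold):
    """Crude change detection: the input is 1 while power exceeds threshold."""
    return (np.asarray(power, dtype=float) > threshold).astype(float)

def fit_arx(y, u, na=2, nb=2):
    """Least-squares ARX fit of
    y[k] = a_1 y[k-1] + ... + a_na y[k-na] + b_0 u[k] + ... + b_{nb-1} u[k-nb+1].
    """
    y = np.asarray(y, dtype=float)
    u = np.asarray(u, dtype=float)
    start = max(na, nb - 1)
    rows = []
    for k in range(start, len(y)):
        past_y = [y[k - j] for j in range(1, na + 1)]
        past_u = [u[k - j] for j in range(nb)]
        rows.append(past_y + past_u)
    Phi = np.array(rows)
    theta, *_ = np.linalg.lstsq(Phi, y[start:], rcond=None)
    return theta[:na], theta[na:]

# Synthetic check: data generated by y[k] = 0.7 y[k-1] + 0.3 u[k-1].
u = np.zeros(80)
u[10:40] = 1.0
y = np.zeros(80)
for k in range(1, 80):
    y[k] = 0.7 * y[k - 1] + 0.3 * u[k - 1]

a_hat, b_hat = fit_arx(y, u, na=1, nb=2)
```

An ARX model fitted this way can be converted to the state-space form (2) by standard realization methods.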

Then, we ran an experiment where we had a microwave, 
a toaster, and a kettle (devices 5, 2, and 1, respectively) 
operating at different time intervals. These individual plug 
measurements are in Figure 4. We can note that the device
power consumptions are not completely independent; one de- 
vice turning on can affect the power consumption of another 
device. However, we found this effect to be negligible in our 
disaggregation algorithms. 

Fig. 4. The measurements of individual plug RMS currents.

The results from using our disaggregation method on the
experimental data are presented in Figure 5. The estimated
power consumption lines up with the measured power con- 
sumption quite well. Furthermore, the power consumption of 
the toaster and the kettle are correctly identified. However, 
the microwave is erroneously identified as a monitor. This 
is because the dynamics of these two models are quite 
similar. This error can easily be compensated for by setting 
a maximum power consumption for each device. That is, 
we can state a priori that we know an LCD monitor will 
not draw over 10 amps of RMS current. When we add this 
prior, the microwave becomes correctly labeled. 

Fig. 5. The estimated power consumption signals of each device.

Examining the data, we can see that methods which do
not take into account the dynamics of the devices, such
as the hidden Markov model (HMM) methods in [13],
[14], will likely confuse the kettle and toaster, which have
similar amplitudes and can have similar durations. Also, the
sparse coding method in [8] requires a large training data
set to serve as a dictionary; here, we have a very small
training set from which we derive system models. Thus, a
direct comparison between our method and the sparse coding
method is not possible.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we present a novel framework to perform
the task of disaggregation. We treat individual devices as
systems and try to find the inputs which create the observed
aggregated signal. This framework differs substantially from the
current disaggregation literature, which focuses largely on
unsupervised methods. In contrast, our framework leverages
many techniques and methods in system identification, optimal
control, and hybrid system estimation.

We firmly believe that accounting for the power consumption
profiles of individual devices will significantly improve
disaggregation results. In an unsupervised setting, creating
such models is very difficult. However, under the assumption
that similar devices have similar power consumption profiles,
the cost of collecting data and estimating these models is
not significant. Thus, our framework, which utilizes more
data than completely unsupervised methods, would not be
infeasible to implement widely.

We tested an implementation of our framework on simulated
data, as well as data from a small-scale experiment. The
simulated data is completely recovered, and the experimental
results closely matched the ground truth, although we did not
achieve exact recovery. However, adding some reasonable
assumptions allowed us to completely recover the ground
truth.

For future work, we plan on deploying an experiment
where we collect measurements from more devices in an
actual residential setting. In this experimental setting, we
hope to learn not only device dynamics, but also the user's
consumption patterns. One of the benefits of our framework
is that we can learn devices independent of the consumer,
and then learn the user's consumption patterns. Note that
in many unsupervised methods, keeping the device constant
while varying the consumer's usage patterns would result in
different models entirely.

Additionally, throughout our experiments, we noticed that
some devices do not fit our current modeling assumptions
neatly. For example, the microwave warms up for a second
or two, and then begins heating. This results in two successive
jumps in power consumption. With our current modeling
assumptions, the best fit is an over-damped system. This is
not ideal, and we hope to model devices as hybrid systems
with multiple modes in the future.